R-Ladies Melbourne, 29 Nov 2022
An illustration of reasons why we should care about working reproducibly from The Turing Way, Guide for Reproducible Research
{targets}, {renv}, {lintr}, {styler}… We’re awash in information! What we need is curation.
“Like families, tidy datasets are all alike but every messy dataset is messy in its own way” - {tidyr}: Tidy data
“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.” - The tidyverse style guide
Jump in here:
Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want the computer to do. – (Knuth 1984)
From Modern Data Book by Martin Shepperd:
Start with:
and then maybe:
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data. - R Packages (2e)
From simple to more involved:
“The command line is a tool for talking to your operating system (e.g., macOS, Windows, etc.) using text instead of by moving around a mouse and clicking on things”
- The Command Line from Practical Data Science by Nick Eubank
Dip your toes in with:
Then dive deeper…
bash, zsh, Terminal, shell
A version control system, or VCS, tracks the history of changes as people and teams collaborate on projects together. As developers make changes to the project, any earlier version of the project can be recovered at any time.
The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.
Some good starting points:
git, github
Experimenting with more advanced features:
From The Turing Way, Guide for Collaboration:
Some (often) good enough tools:
Some R specific resources:
A pipeline is a computational workflow that does statistics, analytics, or data science… A pipeline contains tasks to prepare datasets, run models, and summarize results for a business deliverable or research paper.
On my to-explore list:
Ways of capturing computational environments from The Turing Way, Guide for Reproducible Research
Possible starting points:
Motivation and guidance on testing:
Special mentions to:
Find me @cynthiahqy on:
Coming Soon…
R-Ladies theme for Quarto Presentations. Code available on GitHub.